On The Importance of Execution Ordering in Graph-Based Distributed Machine Learning Systems
Authors
Abstract
Execution of operations in distributed machine learning systems has largely ignored dependencies between communication and computation ops. In this paper, we make the case that model-aware ordering of operations at individual machines can decrease the step time of a training iteration in distributed machine learning systems while also improving network utilization. The contributions of this work are:
• We introduce a metric for quantitatively measuring the efficiency of an ordering of ops (§1).
• We propose an ordering heuristic for Model-Replica with Parameter Server systems (§2.1).
• We chalk out a roadmap for developing fast heuristics for model-aware ordering of ops in Model-Replica systems with all-reduce synchronization (§2.2).
• We evaluate our ordering mechanism on Model-Replica with Parameter Server on TensorFlow and show that the training efficiency can be improved by up to 78% through better ordering of tasks, with a 46% reduction in step time (§3).

1 PROBLEM DEFINITION

Graph-based machine learning systems such as TensorFlow [1] and PyTorch [12] evolved as a response to the growing complexity of the problems that artificial intelligence is trying to solve. In these systems, a machine learning model (model) is represented by a directed acyclic graph (DAG) with predefined operations (ops) as nodes and their data/control dependencies as edges. When a distributed model is sent for execution, each op is assigned a tag [1]. This tag determines the device on which the op will run. Using the type of the op, we can also infer the associated resource on the device: the computation unit, or the communication channel from a specific source. Thus, the tagging of ops in a distributed model lets us determine precisely the number and type of ops assigned to a given device. In addition, the DAG constrains the relative ordering of ops on a device, and many orderings are usually feasible. We show that the performance of the model can vary widely across these feasible orderings.
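The tagging described above can be pictured with a small data structure. The following is only an illustrative sketch: the `Op` class, the `resource_of` helper, and the device and op names are hypothetical and are not TensorFlow's API. It assumes each op carries a device tag and a kind, and that a receive op also names its sending device.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical op record: one node of the model DAG."""
    name: str
    device: str                  # tag: the device on which the op runs
    kind: str                    # "compute" or "recv" (assumed op types)
    source: str = ""             # sending device, used only by "recv" ops
    deps: Tuple[str, ...] = ()   # names of ops this op depends on


def resource_of(op: Op) -> Tuple[str, str]:
    """Infer the (device, resource) pair from the op's tag and type."""
    if op.kind == "recv":
        return (op.device, f"channel_from:{op.source}")
    return (op.device, "compute_unit")


def ops_per_resource(ops: List[Op]) -> Dict[Tuple[str, str], List[str]]:
    """Group op names by the resource they occupy."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for op in ops:
        groups[resource_of(op)].append(op.name)
    return dict(groups)


# Hypothetical four-op graph on a single worker fed by one parameter server.
GRAPH = [
    Op("recv_w1", "worker0", "recv", source="ps0"),
    Op("recv_w2", "worker0", "recv", source="ps0"),
    Op("matmul1", "worker0", "compute", deps=("recv_w1",)),
    Op("matmul2", "worker0", "compute", deps=("matmul1", "recv_w2")),
]

if __name__ == "__main__":
    for resource, names in ops_per_resource(GRAPH).items():
        print(resource, names)
```

Grouping by resource shows the number and type of ops that compete for each computation unit or channel. The `deps` field records the DAG edges that limit which orderings are feasible.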
Our goal is to select and enforce execution orders with minimal makespan. First, we illustrate the significance of ordering with a simple example. In Figure 1a, there are two communication ops (read1, read2; abbreviated r1, r2) and two computation ops (conv1, conv2; abbreviated c1, c2) assigned to a device. While both r1 → r2 → c1 → c2 and r2 → r1 → c1 → c2 are topologically valid orders, as shown in Figures 1b and 1c, the latter has the worst step time because transfer and computation do not overlap.

Figure 1: Example of impact of ordering on performance. (a) DAG. (b) Best execution order. (c) Worst execution order.

Next, we define the problem formally. An execution order is defined as a function that maps an op to a real number. This mapping determines the relative ordering of ops on a given resource. Our goal is to determine an execution order that minimizes the makespan of the model. This problem of finding an order that minimizes the makespan can be mapped to the job shop problem [9], which is known to be NP-complete. Given a performance oracle, Cost, which predicts the actual cost of running an op on a resource, the upper and lower bounds of the makespan are, respectively:
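The Figure 1 example can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation. Figure 1a is not reproduced in the text, so the dependencies are assumed: conv1 needs read1, and conv2 needs conv1 and read2. Every op is assumed to cost one time unit, and the device is assumed to have one network channel and one compute unit. The `makespan` function and the `OPS` table are hypothetical names. An execution order is modelled, as in the definition above, as a mapping from op to a real number, where lower values run first on each resource.

```python
from typing import Dict, List, Tuple

# Hypothetical op table: name -> (resource, cost, dependencies).
# Dependencies and unit costs are assumptions made for illustration.
OPS: Dict[str, Tuple[str, float, List[str]]] = {
    "read1": ("network", 1.0, []),
    "read2": ("network", 1.0, []),
    "conv1": ("compute", 1.0, ["read1"]),
    "conv2": ("compute", 1.0, ["conv1", "read2"]),
}


def makespan(ops: Dict[str, Tuple[str, float, List[str]]],
             order: Dict[str, float]) -> float:
    """Simulate the ops when each resource runs them in `order` sequence."""
    # Build one queue per resource, sorted by the execution-order value.
    queues: Dict[str, List[str]] = {}
    for name in sorted(ops, key=lambda n: order[n]):
        queues.setdefault(ops[name][0], []).append(name)

    free_at = {resource: 0.0 for resource in queues}  # when each resource idles
    finish: Dict[str, float] = {}                     # finish time per op

    while len(finish) < len(ops):
        progressed = False
        for resource, queue in queues.items():
            if not queue:
                continue
            name = queue[0]
            _, cost, deps = ops[name]
            if all(d in finish for d in deps):
                # Start once the resource is free and all inputs are ready.
                start = max([free_at[resource]] + [finish[d] for d in deps])
                finish[name] = start + cost
                free_at[resource] = finish[name]
                queue.pop(0)
                progressed = True
        if not progressed:
            raise ValueError("execution order conflicts with the DAG")
    return max(finish.values())


BEST = {"read1": 0, "read2": 1, "conv1": 2, "conv2": 3}   # r1 -> r2 -> c1 -> c2
WORST = {"read2": 0, "read1": 1, "conv1": 2, "conv2": 3}  # r2 -> r1 -> c1 -> c2

if __name__ == "__main__":
    print("best order makespan: ", makespan(OPS, BEST))
    print("worst order makespan:", makespan(OPS, WORST))
```

Under these assumptions the first order finishes in 3 time units, because conv1 overlaps with the transfer of read2. The second order needs 4, because no computation can start until both transfers are done.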
Similar resources
A new Shuffled Genetic-based Task Scheduling Algorithm in Heterogeneous Distributed Systems
Distributed systems such as Grid and Cloud Computing provision web services to their users all over the world. One of the most important concerns which service providers encounter is handling the total cost of ownership (TCO). A large part of TCO is related to power consumption due to inefficient resource management. The task scheduling module, as a key component, can have a drastic impact on both user...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Error assessment in man-machine systems using the CREAM method and human-in-the-loop fault tree analysis
Background and Objectives: Despite contribution to catastrophic accidents, human errors have been generally ignored in the design of human-machine (HM) systems and the determination of the level of automation (LOA). This paper aims to develop a method to estimate the level of automation in the early stage of the design phase considering both human and machine performance. Methods: A quantita...
Optimization Task Scheduling Algorithm in Cloud Computing
Since software systems play a more important role in applications than ever, security has become one of the most important indicators of software. Cloud computing refers to services that run in a distributed network and are accessible through common internet protocols. Presenting a proper scheduling method can lead to efficient use of resources by decreasing response time and costs. This rese...
Machine learning based Visual Evoked Potential (VEP) Signals Recognition
Introduction: Visual evoked potentials contain certain diagnostic information which have proved to be of importance in the visual systems functional integrity. Due to substantial decrease of amplitude in extra macular stimulation in commonly used pattern VEPs, differentiating normal and abnormal signals can prove to be quite an obstacle. Due to developments of use of machine l...